Overview

Dataset statistics

Number of variables: 39
Number of observations: 38123
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.3 MiB
Average record size in memory: 312.0 B

Variable types

BOOL: 25
NUM: 14

Reproduction

Analysis started: 2020-04-23 12:38:59.701008
Analysis finished: 2020-04-23 12:40:15.699553
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

installment is highly correlated with loan_amnt (High Correlation)
loan_amnt is highly correlated with installment (High Correlation)
term_ 60 months is highly correlated with term_ 36 months (High Correlation)
term_ 36 months is highly correlated with term_ 60 months (High Correlation)
annual_inc is highly skewed (γ1 = 31.18468597) (Skewed)
emp_length has 4542 (11.9%) zeros (Zeros)
delinq_2yrs has 33968 (89.1%) zeros (Zeros)
inq_last_6mths has 18503 (48.5%) zeros (Zeros)
pub_rec has 36142 (94.8%) zeros (Zeros)
revol_bal has 907 (2.4%) zeros (Zeros)
revol_util has 936 (2.5%) zeros (Zeros)
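
As a rough check on the zero-count flags listed above, the percentages can be recomputed from the zero counts and the number of observations (38123) reported in the overview. This is a minimal sketch that uses only figures copied from this report; the helper name `zero_share` is hypothetical and is not part of pandas-profiling.

```python
# Number of observations, copied from the dataset statistics in this report.
N_OBSERVATIONS = 38123

# Zero counts, copied from the zero-count flags in this report.
ZERO_COUNTS = {
    "emp_length": 4542,
    "delinq_2yrs": 33968,
    "inq_last_6mths": 18503,
    "pub_rec": 36142,
    "revol_bal": 907,
    "revol_util": 936,
}


def zero_share(count, total=N_OBSERVATIONS):
    """Return the percentage of rows equal to zero, rounded to one decimal."""
    return round(100.0 * count / total, 1)


for name, count in ZERO_COUNTS.items():
    print(f"{name}: {count} zeros ({zero_share(count)}%)")
```

The rounded shares should agree with the percentages shown in the flags, which suggests the report divides each zero count by the total number of rows.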

Variables

loan_amnt
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 871
Unique (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11215.7864
Minimum: 500
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 500
5-th percentile: 2400
Q1: 5500
Median: 10000
Q3: 15000
95-th percentile: 25000
Maximum: 35000
Range: 34500
Interquartile range (IQR): 9500

Descriptive statistics

Standard deviation: 7403.880544
Coefficient of variation (CV): 0.6601303091
Kurtosis: 0.7778054518
Mean: 11215.7864
Median Absolute Deviation (MAD): 5866.200765
Skewness: 1.057600171
Sum: 427579425
Variance: 54817447.11
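
Several of the loan_amnt figures above are derived from others in the same block. The following sketch recomputes the range, the interquartile range, the coefficient of variation and the variance from the reported minimum, maximum, quartiles, mean and standard deviation. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative only.

```python
from math import isclose

# Values copied from the loan_amnt section of this report.
minimum, maximum = 500, 35000
q1, q3 = 5500, 15000
mean = 11215.7864
std = 7403.880544

value_range = maximum - minimum  # compare with the reported Range
iqr = q3 - q1                    # compare with the reported IQR
cv = std / mean                  # compare with the reported CV
variance = std ** 2              # compare with the reported Variance

print(value_range, iqr, cv, variance)

# The recomputed values agree with the report up to rounding of the inputs.
print(isclose(cv, 0.6601303091, abs_tol=1e-6))
print(isclose(variance, 54817447.11, abs_tol=1.0))
```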

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 500. 975. 1025. 1175. 1225. ... 32125. 33975. 34100. 34900. 35000.], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
10000 | 2760 | 7.2%
12000 | 2262 | 5.9%
5000 | 1964 | 5.2%
6000 | 1846 | 4.8%
15000 | 1835 | 4.8%
20000 | 1563 | 4.1%
8000 | 1536 | 4.0%
25000 | 1345 | 3.5%
4000 | 1090 | 2.9%
7000 | 989 | 2.6%
Other values (861) | 20933 | 54.9%

Minimum 5 values

Value | Count | Frequency (%)
500 | 5 | < 0.1%
725 | 1 | < 0.1%
750 | 1 | < 0.1%
800 | 1 | < 0.1%
900 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
35000 | 629 | 1.6%
34800 | 2 | < 0.1%
34675 | 1 | < 0.1%
34525 | 1 | < 0.1%
34475 | 5 | < 0.1%
 

installment
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 14990
Unique (%): 39.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 325.710319
Minimum: 15.69
Maximum: 1305.19
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 15.69
5-th percentile: 73.206
Q1: 167.97
Median: 281.47
Q3: 431.37
95-th percentile: 764.559
Maximum: 1305.19
Range: 1289.5
Interquartile range (IQR): 263.4

Descriptive statistics

Standard deviation: 208.7539831
Coefficient of variation (CV): 0.6409191573
Kurtosis: 1.259144465
Mean: 325.710319
Median Absolute Deviation (MAD): 162.5686474
Skewness: 1.130713193
Sum: 12417054.49
Variance: 43578.22548

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 15.69 30.375 30.975 31.23 32.435 ... 1094.695 1095.405 1109.1 1112.135 1305.19 ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
311.11 | 68 | 0.2%
311.02 | 54 | 0.1%
180.96 | 53 | 0.1%
150.8 | 46 | 0.1%
368.45 | 45 | 0.1%
372.12 | 44 | 0.1%
339.31 | 42 | 0.1%
330.76 | 42 | 0.1%
317.72 | 41 | 0.1%
186.61 | 41 | 0.1%
Other values (14980) | 37647 | 98.8%

Minimum 5 values

Value | Count | Frequency (%)
15.69 | 1 | < 0.1%
16.08 | 1 | < 0.1%
16.25 | 1 | < 0.1%
16.31 | 1 | < 0.1%
16.47 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1305.19 | 1 | < 0.1%
1302.69 | 1 | < 0.1%
1295.21 | 1 | < 0.1%
1288.1 | 2 | < 0.1%
1283.5 | 1 | < 0.1%
 

grade
Real number (ℝ≥0)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.575899064
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 3
95-th percentile: 5
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.385686989
Coefficient of variation (CV): 0.5379430462
Kurtosis: 0.06214388089
Mean: 2.575899064
Median Absolute Deviation (MAD): 1.148680751
Skewness: 0.7929293464
Sum: 98201
Variance: 1.920128431

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 2.5 3.5 4.5 5.5 6.5 7. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
2 | 11545 | 30.3%
1 | 9675 | 25.4%
3 | 7801 | 20.5%
4 | 5086 | 13.3%
5 | 2715 | 7.1%
6 | 993 | 2.6%
7 | 308 | 0.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 9675 | 25.4%
2 | 11545 | 30.3%
3 | 7801 | 20.5%
4 | 5086 | 13.3%
5 | 2715 | 7.1%

Maximum 5 values

Value | Count | Frequency (%)
7 | 308 | 0.8%
6 | 993 | 2.6%
5 | 2715 | 7.1%
4 | 5086 | 13.3%
3 | 7801 | 20.5%
 

emp_length
Real number (ℝ≥0)

ZEROS
Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.96172914
Minimum: 0
Maximum: 10
Zeros: 4542
Zeros (%): 11.9%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 4
Q3: 9
95-th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.561422497
Coefficient of variation (CV): 0.7177784995
Kurtosis: -1.362418774
Mean: 4.96172914
Median Absolute Deviation (MAD): 3.111384028
Skewness: 0.2055088976
Sum: 189156
Variance: 12.6837302

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 7.5 8.5 9.5 10. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
10 | 8715 | 22.9%
0 | 4542 | 11.9%
2 | 4344 | 11.4%
3 | 4050 | 10.6%
4 | 3385 | 8.9%
5 | 3243 | 8.5%
1 | 3207 | 8.4%
6 | 2198 | 5.8%
7 | 1738 | 4.6%
8 | 1457 | 3.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 4542 | 11.9%
1 | 3207 | 8.4%
2 | 4344 | 11.4%
3 | 4050 | 10.6%
4 | 3385 | 8.9%

Maximum 5 values

Value | Count | Frequency (%)
10 | 8715 | 22.9%
9 | 1244 | 3.3%
8 | 1457 | 3.8%
7 | 1738 | 4.6%
6 | 2198 | 5.8%
 

annual_inc
Real number (ℝ≥0)

SKEWED
Distinct count: 5060
Unique (%): 13.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69552.34513
Minimum: 4000
Maximum: 6000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 4000
5-th percentile: 24000
Q1: 41159.5
Median: 60000
Q3: 83000
95-th percentile: 143003.6
Maximum: 6000000
Range: 5996000
Interquartile range (IQR): 41840.5

Descriptive statistics

Standard deviation: 64462.95973
Coefficient of variation (CV): 0.9268265449
Kurtosis: 2299.557497
Mean: 69552.34513
Median Absolute Deviation (MAD): 30575.05953
Skewness: 31.18468597
Sum: 2651544053
Variance: 4155473177

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[4.0000000e+03 9.3000000e+03 9.9800000e+03 1.0010000e+04 1.1880000e+04 ... 3.5084998e+05 4.1750000e+05 7.9000000e+05 1.4010000e+06 6.0000000e+06], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
60000 | 1456 | 3.8%
50000 | 1014 | 2.7%
40000 | 850 | 2.2%
45000 | 803 | 2.1%
75000 | 793 | 2.1%
65000 | 784 | 2.1%
30000 | 779 | 2.0%
70000 | 712 | 1.9%
48000 | 693 | 1.8%
80000 | 642 | 1.7%
Other values (5050) | 29597 | 77.6%

Minimum 5 values

Value | Count | Frequency (%)
4000 | 1 | < 0.1%
4080 | 1 | < 0.1%
4800 | 1 | < 0.1%
5000 | 1 | < 0.1%
5500 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
6000000 | 1 | < 0.1%
3900000 | 1 | < 0.1%
2039784 | 1 | < 0.1%
1900000 | 1 | < 0.1%
1782000 | 1 | < 0.1%
 
loan_status
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
1 | 32717 | 85.8%
0 | 5406 | 14.2%
 

dti
Real number (ℝ≥0)

Distinct count: 2854
Unique (%): 7.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.3100139
Minimum: 0
Maximum: 29.99
Zeros: 166
Zeros (%): 0.4%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.15
Q1: 8.19
Median: 13.4
Q3: 18.57
95-th percentile: 23.82
Maximum: 29.99
Range: 29.99
Interquartile range (IQR): 10.38

Descriptive statistics

Standard deviation: 6.66277884
Coefficient of variation (CV): 0.5005839129
Kurtosis: -0.8506276442
Mean: 13.3100139
Median Absolute Deviation (MAD): 5.584425811
Skewness: -0.02898642412
Sum: 507417.66
Variance: 44.39262187

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000e+00 5.0000e-03 1.9500e-01 1.6250e+00 4.0550e+00 ... 2.0695e+01 2.3265e+01 2.4995e+01 2.7995e+01 2.9990e+01], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
0 | 166 | 0.4%
12 | 45 | 0.1%
18 | 44 | 0.1%
19.2 | 39 | 0.1%
13.2 | 38 | 0.1%
16.8 | 38 | 0.1%
12.48 | 37 | 0.1%
15 | 35 | 0.1%
14.29 | 35 | 0.1%
13.5 | 34 | 0.1%
Other values (2844) | 37612 | 98.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 166 | 0.4%
0.01 | 3 | < 0.1%
0.02 | 5 | < 0.1%
0.03 | 2 | < 0.1%
0.04 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
29.99 | 1 | < 0.1%
29.95 | 1 | < 0.1%
29.93 | 3 | < 0.1%
29.92 | 2 | < 0.1%
29.89 | 1 | < 0.1%
 

delinq_2yrs
Real number (ℝ≥0)

ZEROS
Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1469716444
Minimum: 0
Maximum: 11
Zeros: 33968
Zeros (%): 89.1%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 11
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4925742554
Coefficient of variation (CV): 3.351491761
Kurtosis: 39.8475732
Mean: 0.1469716444
Median Absolute Deviation (MAD): 0.2619066085
Skewness: 5.036117222
Sum: 5603
Variance: 0.2426293971

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 4.5 6.5 11. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
0 | 33968 | 89.1%
1 | 3186 | 8.4%
2 | 660 | 1.7%
3 | 212 | 0.6%
4 | 58 | 0.2%
5 | 21 | 0.1%
6 | 10 | < 0.1%
7 | 4 | < 0.1%
8 | 2 | < 0.1%
11 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 33968 | 89.1%
1 | 3186 | 8.4%
2 | 660 | 1.7%
3 | 212 | 0.6%
4 | 58 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
11 | 1 | < 0.1%
9 | 1 | < 0.1%
8 | 2 | < 0.1%
7 | 4 | < 0.1%
6 | 10 | < 0.1%
 

inq_last_6mths
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8710227422
Minimum: 0
Maximum: 8
Zeros: 18503
Zeros (%): 48.5%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.070673241
Coefficient of variation (CV): 1.229213876
Kurtosis: 2.505123217
Mean: 0.8710227422
Median Absolute Deviation (MAD): 0.8455018649
Skewness: 1.380687958
Sum: 33206
Variance: 1.146341189

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 4.5 5.5 6.5 8. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
0 | 18503 | 48.5%
1 | 10519 | 27.6%
2 | 5593 | 14.7%
3 | 2949 | 7.7%
4 | 309 | 0.8%
5 | 143 | 0.4%
6 | 60 | 0.2%
7 | 33 | 0.1%
8 | 14 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 18503 | 48.5%
1 | 10519 | 27.6%
2 | 5593 | 14.7%
3 | 2949 | 7.7%
4 | 309 | 0.8%

Maximum 5 values

Value | Count | Frequency (%)
8 | 14 | < 0.1%
7 | 33 | 0.1%
6 | 60 | 0.2%
5 | 143 | 0.4%
4 | 309 | 0.8%
 

open_acc
Real number (ℝ≥0)

Distinct count: 40
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.315268998
Minimum: 2
Maximum: 44
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 3
Q1: 6
Median: 9
Q3: 12
95-th percentile: 18
Maximum: 44
Range: 42
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.395012763
Coefficient of variation (CV): 0.4718073911
Kurtosis: 1.702186306
Mean: 9.315268998
Median Absolute Deviation (MAD): 3.434796094
Skewness: 1.007213396
Sum: 355126
Variance: 19.31613719

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 2. 2.5 3.5 4.5 5.5 ... 23.5 25.5 30.5 35.5 44. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
7 | 3869 | 10.1%
6 | 3786 | 9.9%
8 | 3777 | 9.9%
9 | 3597 | 9.4%
10 | 3091 | 8.1%
5 | 3033 | 8.0%
11 | 2651 | 7.0%
4 | 2235 | 5.9%
12 | 2206 | 5.8%
13 | 1838 | 4.8%
Other values (30) | 8040 | 21.1%

Minimum 5 values

Value | Count | Frequency (%)
2 | 547 | 1.4%
3 | 1408 | 3.7%
4 | 2235 | 5.9%
5 | 3033 | 8.0%
6 | 3786 | 9.9%

Maximum 5 values

Value | Count | Frequency (%)
44 | 1 | < 0.1%
42 | 1 | < 0.1%
41 | 1 | < 0.1%
39 | 1 | < 0.1%
38 | 1 | < 0.1%
 

pub_rec
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0537733127
Minimum: 0
Maximum: 4
Zeros: 36142
Zeros (%): 94.8%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2350266352
Coefficient of variation (CV): 4.370692885
Kurtosis: 25.41605262
Mean: 0.0537733127
Median Absolute Deviation (MAD): 0.1019581391
Skewness: 4.644253189
Sum: 2050
Variance: 0.05523751925

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 4. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
0 | 36142 | 94.8%
1 | 1924 | 5.0%
2 | 47 | 0.1%
3 | 8 | < 0.1%
4 | 2 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 36142 | 94.8%
1 | 1924 | 5.0%
2 | 47 | 0.1%
3 | 8 | < 0.1%
4 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4 | 2 | < 0.1%
3 | 8 | < 0.1%
2 | 47 | 0.1%
1 | 1924 | 5.0%
0 | 36142 | 94.8%
 

revol_bal
Real number (ℝ≥0)

ZEROS
Distinct count: 21240
Unique (%): 55.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13420.93125
Minimum: 0
Maximum: 149588
Zeros: 907
Zeros (%): 2.4%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 344.1
Q1: 3727
Median: 8901
Q3: 17095
95-th percentile: 41675.8
Maximum: 149588
Range: 149588
Interquartile range (IQR): 13368

Descriptive statistics

Standard deviation: 15908.51452
Coefficient of variation (CV): 1.185351018
Kurtosis: 14.89510269
Mean: 13420.93125
Median Absolute Deviation (MAD): 10273.86136
Skewness: 3.193246047
Sum: 511646162
Variance: 253080834.1

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 2.650000e+01 5.005000e+02 3.450500e+03 ... 6.457050e+04 8.200950e+04 1.016345e+05 1.206270e+05 1.495880e+05], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
0 | 907 | 2.4%
298 | 14 | < 0.1%
255 | 14 | < 0.1%
1 | 11 | < 0.1%
682 | 10 | < 0.1%
865 | 9 | < 0.1%
1763 | 9 | < 0.1%
798 | 9 | < 0.1%
39 | 9 | < 0.1%
1159 | 9 | < 0.1%
Other values (21230) | 37122 | 97.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 907 | 2.4%
1 | 11 | < 0.1%
2 | 5 | < 0.1%
3 | 6 | < 0.1%
4 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
149588 | 1 | < 0.1%
149527 | 1 | < 0.1%
149000 | 1 | < 0.1%
148829 | 1 | < 0.1%
148804 | 1 | < 0.1%
 

revol_util
Real number (ℝ≥0)

ZEROS
Distinct count: 1087
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.90457335
Minimum: 0
Maximum: 99.9
Zeros: 936
Zeros (%): 2.5%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.7
Q1: 25.5
Median: 49.4
Q3: 72.4
95-th percentile: 93.6
Maximum: 99.9
Range: 99.9
Interquartile range (IQR): 46.9

Descriptive statistics

Standard deviation: 28.32566565
Coefficient of variation (CV): 0.5792027966
Kurtosis: -1.104387831
Mean: 48.90457335
Median Absolute Deviation (MAD): 24.2330198
Skewness: -0.03637014169
Sum: 1864389.05
Variance: 802.3433346

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 5.000e-03 7.500e-02 1.100e-01 1.800e-01 ... 8.849e+01 9.443e+01 9.448e+01 9.785e+01 9.990e+01], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
0 | 936 | 2.5%
0.2 | 62 | 0.2%
63 | 61 | 0.2%
40.7 | 58 | 0.2%
66.7 | 56 | 0.1%
61 | 56 | 0.1%
0.1 | 56 | 0.1%
37.6 | 56 | 0.1%
70.4 | 55 | 0.1%
31.2 | 55 | 0.1%
Other values (1077) | 36672 | 96.2%

Minimum 5 values

Value | Count | Frequency (%)
0 | 936 | 2.5%
0.01 | 1 | < 0.1%
0.03 | 1 | < 0.1%
0.04 | 1 | < 0.1%
0.05 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
99.9 | 24 | 0.1%
99.8 | 23 | 0.1%
99.7 | 29 | 0.1%
99.6 | 20 | 0.1%
99.5 | 24 | 0.1%
 

total_acc
Real number (ℝ≥0)

Distinct count: 82
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.11979645
Minimum: 2
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 7
Q1: 14
Median: 20
Q3: 29
95-th percentile: 43
Maximum: 90
Range: 88
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.40033683
Coefficient of variation (CV): 0.5153906753
Kurtosis: 0.6852883055
Mean: 22.11979645
Median Absolute Deviation (MAD): 9.062718852
Skewness: 0.8253969368
Sum: 843273
Variance: 129.9676798

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 2. 2.5 3.5 4.5 5.5 ... 54.5 62.5 63.5 67.5 90. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
16 | 1414 | 3.7%
15 | 1402 | 3.7%
17 | 1392 | 3.7%
14 | 1391 | 3.6%
20 | 1383 | 3.6%
18 | 1374 | 3.6%
21 | 1343 | 3.5%
13 | 1333 | 3.5%
19 | 1296 | 3.4%
12 | 1272 | 3.3%
Other values (72) | 24523 | 64.3%

Minimum 5 values

Value | Count | Frequency (%)
2 | 4 | < 0.1%
3 | 169 | 0.4%
4 | 397 | 1.0%
5 | 522 | 1.4%
6 | 651 | 1.7%

Maximum 5 values

Value | Count | Frequency (%)
90 | 1 | < 0.1%
87 | 1 | < 0.1%
81 | 1 | < 0.1%
80 | 1 | < 0.1%
79 | 2 | < 0.1%
 

fico_average
Real number (ℝ≥0)

Distinct count: 36
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 716.7168376
Minimum: 627
Maximum: 827
Zeros: 0
Zeros (%): 0.0%
Memory size: 298.0 KiB

Quantile statistics

Minimum: 627
5-th percentile: 667
Q1: 687
Median: 712
Q3: 742
95-th percentile: 782
Maximum: 827
Range: 200
Interquartile range (IQR): 55

Descriptive statistics

Standard deviation: 35.71008725
Coefficient of variation (CV): 0.04982454071
Kurtosis: -0.5552860408
Mean: 716.7168376
Median Absolute Deviation (MAD): 29.72249728
Skewness: 0.465389352
Sum: 27323396
Variance: 1275.210332

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[627. 647. 664.5 674.5 679.5 ... 784.5 799.5 809.5 814.5 827. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
702 | 2049 | 5.4%
687 | 2030 | 5.3%
697 | 1982 | 5.2%
692 | 1967 | 5.2%
682 | 1959 | 5.1%
722 | 1778 | 4.7%
707 | 1769 | 4.6%
677 | 1757 | 4.6%
727 | 1723 | 4.5%
717 | 1710 | 4.5%
Other values (26) | 19399 | 50.9%

Minimum 5 values

Value | Count | Frequency (%)
627 | 1 | < 0.1%
632 | 1 | < 0.1%
662 | 1377 | 3.6%
667 | 1572 | 4.1%
672 | 1613 | 4.2%

Maximum 5 values

Value | Count | Frequency (%)
827 | 2 | < 0.1%
822 | 17 | < 0.1%
817 | 24 | 0.1%
812 | 115 | 0.3%
807 | 172 | 0.5%
 
home_ownership_MORTGAGE
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 21178 | 55.6%
1 | 16945 | 44.4%
 
home_ownership_NONE
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 38120 | > 99.9%
1 | 3 | < 0.1%
 
home_ownership_OTHER
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 38027 | 99.7%
1 | 96 | 0.3%
 
home_ownership_OWN
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 35315 | 92.6%
1 | 2808 | 7.4%
 
home_ownership_RENT
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 19852 | 52.1%
1 | 18271 | 47.9%
 
verification_status_Not Verified
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 21732 | 57.0%
1 | 16391 | 43.0%
 
verification_status_Source Verified
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 28461 | 74.7%
1 | 9662 | 25.3%
 
verification_status_Verified
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 26053 | 68.3%
1 | 12070 | 31.7%
 
purpose_car
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 36640 | 96.1%
1 | 1483 | 3.9%
 
purpose_credit_card
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 33179 | 87.0%
1 | 4944 | 13.0%
 
purpose_debt_consolidation
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 20158 | 52.9%
1 | 17965 | 47.1%
 
purpose_educational
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 37811 | 99.2%
1 | 312 | 0.8%
 
purpose_home_improvement
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 35271 | 92.5%
1 | 2852 | 7.5%
 
purpose_house
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 37764 | 99.1%
1 | 359 | 0.9%
 
purpose_major_purchase
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 36018 | 94.5%
1 | 2105 | 5.5%
 
purpose_medical
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 37460 | 98.3%
1 | 663 | 1.7%
 
purpose_moving
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 37567 | 98.5%
1 | 556 | 1.5%
 
purpose_other
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 34359 | 90.1%
1 | 3764 | 9.9%
 
purpose_renewable_energy
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 38028 | 99.8%
1 | 95 | 0.2%
 
purpose_small_business
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 36374 | 95.4%
1 | 1749 | 4.6%
 
purpose_vacation
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 37774 | 99.1%
1 | 349 | 0.9%
 
purpose_wedding
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 37196 | 97.6%
1 | 927 | 2.4%
 

term_ 36 months
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
1 | 28234 | 74.1%
0 | 9889 | 25.9%
 

term_ 60 months
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 298.0 KiB

Value | Count | Frequency (%)
0 | 28234 | 74.1%
1 | 9889 | 25.9%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
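
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation, not the code pandas-profiling itself runs; the function name `pearson_r` is hypothetical, and the example values are the loan_amnt and installment entries of the first five sample rows of this report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of the two standard
    # deviations cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(ss_x * ss_y)


# loan_amnt and installment from the first five sample rows of this report.
loan_amnt = [5000.0, 2500.0, 2400.0, 10000.0, 5000.0]
installment = [162.87, 59.83, 84.33, 339.31, 156.46]

r_sample = pearson_r(loan_amnt, installment)
print(r_sample)

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(loan_amnt, [2.0 * v + 3.0 for v in installment])
print(abs(r_sample - r_rescaled) < 1e-9)
```

Even on five rows the coefficient is close to 1, in line with the high-correlation flag the report raises for loan_amnt and installment.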

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
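
In other words, ρ is Pearson's r applied to ranks. A minimal sketch under that reading is shown below; tied values receive the average of their ranks, which is a common convention assumed here rather than something this report states. The function names and the toy data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (not from the dataset): y grows monotonically but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On the toy data the relationship is perfectly monotonic, so ρ reaches its maximum while r stays below it, which is the difference between the two measures described above.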

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
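
The pair-counting definition above corresponds to the variant usually called tau-a, which can be sketched as below. Statistical libraries often apply a tie-corrected variant instead, so results on data with many ties (such as the count columns in this dataset) may differ from this sketch. The function name `kendall_tau_a` and the toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (not from the dataset): one swapped pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This direct version compares every pair and therefore scales quadratically with the number of observations; it is meant to mirror the definition, not to be efficient on 38123 rows.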

Missing values

Sample

First rows

,loan_amnt,installment,grade,emp_length,annual_inc,loan_status,dti,delinq_2yrs,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util,total_acc,fico_average,home_ownership_MORTGAGE,home_ownership_NONE,home_ownership_OTHER,home_ownership_OWN,home_ownership_RENT,verification_status_Not Verified,verification_status_Source Verified,verification_status_Verified,purpose_car,purpose_credit_card,purpose_debt_consolidation,purpose_educational,purpose_home_improvement,purpose_house,purpose_major_purchase,purpose_medical,purpose_moving,purpose_other,purpose_renewable_energy,purpose_small_business,purpose_vacation,purpose_wedding,term_ 36 months,term_ 60 months
0,5000.0,162.87,2,10,24000.0,1,27.65,0.0,1.0,3.0,0.0,13648.0,83.7,9.0,737.0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
1,2500.0,59.83,3,0,30000.0,0,1.00,0.0,5.0,3.0,0.0,1687.0,9.4,4.0,742.0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,2400.0,84.33,3,10,12252.0,1,8.72,0.0,2.0,2.0,0.0,2956.0,98.5,10.0,737.0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0
3,10000.0,339.31,3,10,49200.0,1,20.00,0.0,1.0,10.0,0.0,5598.0,21.0,37.0,692.0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
4,5000.0,156.46,1,3,36000.0,1,11.20,0.0,3.0,9.0,0.0,7963.0,28.3,12.0,732.0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0
5,7000.0,170.08,3,8,47004.0,1,23.51,0.0,1.0,7.0,0.0,17726.0,85.6,11.0,692.0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
6,3000.0,109.43,5,9,48000.0,1,5.35,0.0,2.0,4.0,0.0,8221.0,87.5,4.0,662.0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
7,5600.0,152.39,6,4,40000.0,0,5.55,0.0,2.0,11.0,0.0,5210.0,32.6,13.0,677.0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1
8,5375.0,121.45,2,0,15000.0,0,18.08,0.0,0.0,2.0,0.0,9279.0,36.5,3.0,727.0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1
9,6500.0,153.45,3,5,72000.0,1,16.12,0.0,2.0,14.0,0.0,4032.0,20.6,23.0,697.0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1

Last rows

,loan_amnt,installment,grade,emp_length,annual_inc,loan_status,dti,delinq_2yrs,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util,total_acc,fico_average,home_ownership_MORTGAGE,home_ownership_NONE,home_ownership_OTHER,home_ownership_OWN,home_ownership_RENT,verification_status_Not Verified,verification_status_Source Verified,verification_status_Verified,purpose_car,purpose_credit_card,purpose_debt_consolidation,purpose_educational,purpose_home_improvement,purpose_house,purpose_major_purchase,purpose_medical,purpose_moving,purpose_other,purpose_renewable_energy,purpose_small_business,purpose_vacation,purpose_wedding,term_ 36 months,term_ 60 months
38113,5000.0,159.77,2,2,180000.0,1,11.93,0.0,1.0,16.0,0.0,60568.0,39.2,38.0,717.0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0
38114,5000.0,161.25,2,4,48000.0,1,8.03,0.0,1.0,6.0,0.0,28329.0,48.6,6.0,707.0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0
38115,5000.0,164.23,3,0,80000.0,1,1.21,0.0,3.0,15.0,1.0,27185.0,16.1,29.0,672.0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
38116,5000.0,155.38,1,1,85000.0,1,0.31,0.0,0.0,7.0,0.0,216.0,0.6,19.0,787.0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
38117,5000.0,158.30,2,5,75000.0,1,15.55,0.0,0.0,10.0,0.0,66033.0,23.0,29.0,757.0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
38118,2500.0,78.42,1,4,110000.0,1,11.33,0.0,0.0,13.0,0.0,7274.0,13.1,40.0,762.0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0
38119,8500.0,275.38,3,3,18000.0,1,6.40,1.0,1.0,6.0,0.0,8847.0,26.9,9.0,692.0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
38120,5000.0,156.84,1,0,100000.0,1,2.30,0.0,0.0,11.0,0.0,9698.0,19.4,20.0,742.0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0
38121,5000.0,155.38,1,0,200000.0,1,3.72,0.0,0.0,17.0,0.0,85607.0,0.7,26.0,812.0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
38122,7500.0,255.43,5,0,22000.0,1,14.29,1.0,0.0,7.0,0.0,4175.0,51.5,8.0,662.0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0